Identification of Non-Lexicon Non-Slang Unigrams in Body-enhancement Medicinal UBE

Abstract

Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam. The current work aims to identify unigrams in more than 2700 UBE messages that advertise body-enhancement drugs. A unigram is selected only if it is neither present in a dictionary nor a slang term. The motives of the paper are manifold. It is an attempt to analyze spamming behaviour and the use of the word-mutation technique. Alongside this, we have attempted to better understand spam, slang and their interplay. The problem has been addressed by employing tokenization and a unigram bag-of-words (BOW) model. We found that non-lexicon words constitute nearly 66% of the total number of lexis in the corpus, whereas non-slang words constitute nearly 2.4% of the non-lexicon words. Further, non-lexicon non-slang unigrams composed of two lexicon words form more than 71% of all such unigrams. To the best of our knowledge, this is the first attempt to analyze the usage of non-lexicon non-slang unigrams in any kind of UBE.

Keywords: Body Enhancement, Lexicon, Medicinal, Slang, Unigram, Unsolicited Bulk e-mail (UBE)
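The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the function names (`tokenize`, `non_lexicon_non_slang`, `split_into_two_words`), and the toy lexicon, slang list and messages are all assumptions made for illustration. It builds a unigram bag-of-words, keeps the tokens found in neither word list, and checks whether a kept token is a concatenation of two lexicon words.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def non_lexicon_non_slang(messages, lexicon, slang):
    """Build a unigram bag-of-words and keep tokens in neither word list."""
    bow = Counter()
    for msg in messages:
        bow.update(tokenize(msg))
    return {tok: n for tok, n in bow.items()
            if tok not in lexicon and tok not in slang}

def split_into_two_words(token, lexicon):
    """Return (head, tail) if the token is two lexicon words joined, else None."""
    for i in range(1, len(token)):
        head, tail = token[:i], token[i:]
        if head in lexicon and tail in lexicon:
            return head, tail
    return None

# Toy data, hypothetical and far smaller than a real dictionary or corpus.
LEXICON = {"buy", "cheap", "pills", "now", "best", "price", "for", "you"}
SLANG = {"lol", "thx"}
MESSAGES = [
    "Buy cheappills now lol",
    "Bestprice pills for you thx",
    "v1agra cheappills",
]

if __name__ == "__main__":
    kept = non_lexicon_non_slang(MESSAGES, LEXICON, SLANG)
    for token, count in sorted(kept.items()):
        print(token, count, split_into_two_words(token, LEXICON))
```

In this toy run, "cheappills" and "bestprice" are kept and split into two lexicon words, while "v1agra" is kept but has no such split. A real study would need a full dictionary and slang list, which the abstract does not name.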


Similar resources

Identification of Most Frequently Occurring Lexis in Body-enhancement Medicinal Unsolicited Bulk e-mails

E-mail has become an important means of electronic communication, but its usability is marred by Unsolicited Bulk e-mail (UBE) messages. UBE comes in many types, such as pornographic, virus-infected and 'cry-for-help' messages, as well as fake and fraudulent offers of jobs, winnings and medicines. UBE poses technical and socio-economic challenges to the use of e-mail. To meet this cha...


Identifying Potential Adverse Drug Events in Tweets Using Bootstrapped Lexicons

Adverse drug events (ADEs) are medical complications co-occurring with a period of drug usage. Identification of ADEs is a primary way of evaluating available quality of care. As more social media users begin discussing their drug experiences online, public data becomes available for researchers to expand existing electronic ADE reporting systems, though non-standard language inhibits ease of a...


Development of Affective Lexicon for Spanish with Mexican Slang Expressions

There is a growing interest in the automatic extraction of subjective expressions (opinions, emotions and feelings) from texts. To identify the semantic orientation of a text, it is assumed that the occurrence of expressions belonging to some emotional category can be regarded as evidence of an affective state. Based on this assumption, we create an affective lexicon, consisti...


People Re-identification in Non-overlapping Field-of-views using Cumulative Brightness Transform Function and Body Segments in Different Color Spaces

Non-overlapping field-of-view (FOV) cameras are used in surveillance systems to cover a wider area. Tracking in such systems is generally performed in two distinct steps. In the first step, people are identified and tracked in the FOV of a single camera. In the second step, people are re-identified so that they can be tracked across the whole area under surveillance. Various conventional fea...


Enhancement of Tropane Alkaloid Production among Several Clones and Explants Types of Hairy Root of Atropa belladonna L.

Agrobacterium rhizogenes (pRi), the causative agent of hairy root disease, effectively induces hairy root formation in a variety of plant species. In our study, four bacterial strains (AR15834, A4, 9435 and C318) and three explant types (leaf, stem and root) were examined. Hairy roots were induced from root, stem and leaf explants. The highest transformation efficiency of 77% was achieved by usi...



Publication date: 2012